Content Determination in GRE: Evaluating the evaluator

Authors

  • Kees van Deemter
  • Albert Gatt
Abstract

In this paper, we discuss the evaluation measures proposed in a number of recent papers associated with the TUNA project, which have become an important component of the First NLG Shared Task and Evaluation Campaign (STEC) on attribute selection for referring expressions generation. Focusing on reference to individual objects, we discuss what such evaluation measures should be expected to achieve, and what alternative measures merit consideration. The measures mentioned above can be motivated as follows. Suppose a large number of utterance situations had been defined, where each situation contained a number of objects, one of which needed to be described by a referring expression. Suppose, furthermore, that one had an infallible oracle which told us, for each of these situations, what the best referring expression for that situation was. How could this oracle be used to evaluate the extent to which other referring expressions are similar to the one proposed by the oracle? In reality, of course, an infallible oracle is not available. What is available is a large corpus in which sixty-odd human subjects make their best stab at each of the utterance situations. We acknowledge that evaluation measures could handle the unavoidable differences between human subjects in different ways: focussing on the average Dice score of an algorithm (over all subjects as well
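The abstract names the Dice score as a measure of similarity between an algorithm's output and the human descriptions, but the visible text does not define it. The following is a minimal sketch, assuming the standard set-based Dice coefficient applied to the sets of attributes selected for a referring expression. The function names `dice` and `mean_dice`, and the attribute labels, are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Iterable, Set


def dice(a: Set[str], b: Set[str]) -> float:
    """Standard Dice coefficient between two attribute sets.

    Assumption: a referring expression is represented as the set of
    attributes it uses (e.g. {"colour", "type"}).
    """
    if not a and not b:
        # Two empty descriptions are treated as identical (a convention).
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def mean_dice(system: Set[str], humans: Iterable[Set[str]]) -> float:
    """Average Dice score of one system description against several
    human descriptions of the same utterance situation."""
    scores = [dice(system, h) for h in humans]
    if not scores:
        raise ValueError("at least one human description is required")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical data: one system output and two human descriptions.
    system_output = {"colour", "type"}
    human_outputs = [{"colour", "type"}, {"type", "size"}]
    print(mean_dice(system_output, human_outputs))
```

Averaging over all human descriptions is only one way of handling disagreement between subjects; the abstract itself notes that evaluation measures could treat these differences in different ways.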





Publication date: 2007